Learning Character-level Representations for Part-of-Speech Tagging

نویسندگان

  • Cícero Nogueira dos Santos
  • Bianca Zadrozny
چکیده

Distributed word representations have recently been proven to be an invaluable resource for NLP. These representations are normally learned using neural networks and capture syntactic and semantic information about words. Information about word morphology and shape is normally ignored when learning word representations. However, for tasks like part-of-speech tagging, intra-word information is extremely useful, specially when dealing with morphologically rich languages. In this paper, we propose a deep neural network that learns character-level representation of words and associate them with usual word representations to perform POS tagging. Using the proposed approach, while avoiding the use of any handcrafted feature, we produce stateof-the-art POS taggers for two languages: English, with 97.32% accuracy on the Penn Treebank WSJ corpus; and Portuguese, with 97.47% accuracy on the Mac-Morpho corpus, where the latter represents an error reduction of 12.2% on the best previous known result.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Chinese Morphological Analysis with Character-level POS Tagging

The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters. However, existing methods have not yet fully utilized the potentials of Chinese characters. In this paper, we investigate the usefulness of character-level part-of-speech in the task of Chinese morphological analysis. We propose the first tagset designed...

متن کامل

سیستم برچسب گذاری اجزای واژگانی کلام در زبان فارسی

Abstract: Part-Of-Speech (POS) tagging is essential work for many models and methods in other areas in natural language processing such as machine translation, spell checker, text-to-speech, automatic speech recognition, etc. So far, high accurate POS taggers have been created in many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

متن کامل

bot.zen @ EVALITA 2016 - A minimally-deep learning PoS-tagger (trained for Italian Tweets)

English. This article describes the system that participated in the POS tagging for Italian Social Media Texts (PoSTWITA) task of the 5th periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language EVALITA 2016. The work is a continuation of Stemle (2016) with minor modifications to the system and different data sets. It combines a small assertion...

متن کامل

Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF

We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and sub-character level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger ...

متن کامل

Character Word Embedding for NLP tasks in Indian Languages

In the recent time Word Embeddings have been used as unsupervised approach to achieve results comparable to that of supervised methods which use handcrafted features.But information about word morphology and shape is normally ignored when learning word representations.Character level embedding can capture the intra-word information specially when dealing with morphologically rich languages. Her...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014